13.2 Botryology
13.2.2
Principal Component and Linear Discriminant Analyses
The underlying concept of principal component analysis (PCA) is that the higher
the variance of a feature, the more information that feature carries. PCA therefore
linearly transforms a dataset so as to maximize the retained variance while min-
imizing the number of dimensions used to represent the data, which are projected
onto the resulting lower-dimensional (most usefully two-dimensional) space.
The optimal approximation (in the sense of minimizing the least-squares error) of
a D-dimensional random vector x ∈ ℝ^D by a linear combination of D′ < D indepen-
dent vectors is achieved by projecting x onto the eigenvectors (called the principal
axes of the data) corresponding to the largest eigenvalues of the covariance (or scatter)
matrix of the data represented by x. The projections are called the principal com-
ponents. Typically, it is found that one, two, or three principal axes account for the
overwhelming proportion of the variance; the sought-for reduction of dimensionality
is then achieved by discarding all of the other principal axes.
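The procedure just described can be sketched in a few lines; this is an illustrative sketch using NumPy and synthetic data (the variable names and the example dataset are assumptions, not taken from the text):

```python
# Minimal PCA sketch: eigendecomposition of the covariance matrix,
# keeping the D' = 2 principal axes with the largest eigenvalues.
import numpy as np

rng = np.random.default_rng(0)
# 200 samples of a 3-D vector whose variance lies mostly along one direction.
X = rng.normal(size=(200, 1)) @ np.array([[3.0, 2.0, 0.5]]) \
    + 0.1 * rng.normal(size=(200, 3))

Xc = X - X.mean(axis=0)                  # centre the data
C = np.cov(Xc, rowvar=False)             # D x D covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # largest variance first
axes = eigvecs[:, order[:2]]             # top D' = 2 principal axes
scores = Xc @ axes                       # the principal components

# Fraction of total variance carried by each principal axis.
explained = eigvals[order] / eigvals.sum()
```

As the text anticipates, for data of this kind the first entry of `explained` dominates, and the remaining axes can be discarded with little loss.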
The weakness of PCA is that there is no guarantee that any clusters (classes) that
may be present in the original data are better separated under the transformation. This
problem is addressed by linear discriminant analysis (LDA), in which a transforma-
tion of x is sought that maximizes intercluster distances (i.e., the variance between
classes) while minimizing intracluster distances (i.e., the variance within classes).
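For two classes this criterion reduces to Fisher's linear discriminant; a minimal sketch, assuming NumPy and synthetic two-class data (the class locations and names here are illustrative assumptions):

```python
# Fisher LDA sketch for two classes: the discriminant direction
# maximizes between-class variance relative to within-class variance.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))   # class A samples
B = rng.normal(loc=[2.0, 1.0], scale=0.5, size=(100, 2))   # class B samples

mA, mB = A.mean(axis=0), B.mean(axis=0)
# Within-class scatter matrix: summed scatter of each class about its mean.
Sw = (A - mA).T @ (A - mA) + (B - mB).T @ (B - mB)
# Fisher direction w ∝ Sw^{-1} (mB - mA).
w = np.linalg.solve(Sw, mB - mA)
w /= np.linalg.norm(w)

# Projecting onto w separates the two clusters along a single axis.
pA, pB = A @ w, B @ w
```

Unlike the PCA projection, this one-dimensional axis is chosen with the class labels in view, so the projected clusters remain well separated.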
13.2.3
Wavelets
Most readers will be familiar with the representation of arbitrary functions using
Fourier series, namely an infinite sum of sines and cosines (called Fourier basis
functions). 7 This work engendered frequency analysis. A Fourier expansion trans-
forms a function from the time domain into the frequency domain. It is especially
appropriate for a periodic function (i.e., one that is localized in frequency), but is
cumbersome for functions that tend to be localized in time. Wavelets, as the name
suggests, integrate to zero and are well localized. They enable complex functions
to be analysed according to scale; as Graps (1995) points out, they enable one to
see “both the forest and the trees”. They are particularly well suited for representing
functions with sharp discontinuities, and they embody what might be called scale
analysis.
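The separation of coarse scale from local detail can be illustrated with the simplest wavelet, the Haar wavelet (an illustrative choice; the text has not yet introduced a specific wavelet):

```python
# One level of the Haar wavelet transform (sketch).
import numpy as np

def haar_step(signal):
    """Split an even-length signal into coarse averages ("the forest")
    and detail coefficients ("the trees")."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)   # scale information
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)   # localized information
    return approx, detail

# A signal with a sharp discontinuity: the detail coefficients are
# zero everywhere except at the jump, showing the localization.
x = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0, 5.0])
a, d = haar_step(x)
```

The Haar detail filter (difference of neighbours) integrates to zero, as the text requires of a wavelet, which is why it responds only at the discontinuity.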
The starting point is to adopt a wavelet prototype function (the analysing or
mother wavelet) Φ(x). Temporal analysis uses a contracted, high-frequency version
7 Fourier’s assertion was that any 2π-periodic function can be written as
f(x) = a_0 + Σ_{k=1}^∞ (a_k cos kx + b_k sin kx).
The coefficients are defined as a_0 = (2π)^{−1} ∫_0^{2π} f(x) dx,
a_k = π^{−1} ∫_0^{2π} f(x) cos(kx) dx, and
b_k = π^{−1} ∫_0^{2π} f(x) sin(kx) dx.
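The coefficient integrals in the footnote are easy to check numerically; a sketch, assuming a test function with known coefficients (a_0 = 1, a_2 = 3, b_5 = −2) and a simple Riemann sum over one period:

```python
# Numerical check of the Fourier coefficient formulas.
import numpy as np

N = 4096
x = np.arange(N) * 2 * np.pi / N          # uniform grid on [0, 2π)
dx = 2 * np.pi / N
# Test function with known coefficients: a0 = 1, a2 = 3, b5 = -2.
f = 1.0 + 3.0 * np.cos(2 * x) - 2.0 * np.sin(5 * x)

a0 = np.sum(f) * dx / (2 * np.pi)
a2 = np.sum(f * np.cos(2 * x)) * dx / np.pi
b5 = np.sum(f * np.sin(5 * x)) * dx / np.pi
```

For a periodic integrand sampled on a uniform grid, this simple sum recovers the coefficients of a trigonometric polynomial essentially exactly, which is the orthogonality property underlying the footnote's formulas.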